{
 "cells": [
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {
    "id": "FmqC2wndOfI5"
   },
   "source": [
    "# **Sentiment Analysis: Sparse Transfer Learning with the Python API**\n",
    "\n",
    "In this example, you will fine-tune a 90% pruned BERT model onto the Rotten Tomatoes dataset, using a custom teacher model for distillation and SparseML's Hugging Face integration.\n",
    "\n",
    "### **Sparse Transfer Learning Overview**\n",
    "\n",
    "Sparse Transfer Learning is very similar to the typical transfer learning process used to train NLP models, where we fine-tune a pretrained checkpoint onto a smaller downstream dataset. With Sparse Transfer Learning, however, we start the training process from a pre-sparsified checkpoint and maintain that sparsity throughout fine-tuning.\n",
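    "\n",
    "Conceptually, maintaining sparsity means that every weight that is zero in the pre-sparsified checkpoint stays zero throughout fine-tuning. The NumPy sketch below illustrates the idea on a toy weight matrix with random stand-in gradients. It is not SparseML's implementation (the recipe's `ConstantPruningModifier` handles this for you), and the `masked_sgd_step` helper is hypothetical.\n",
    "\n",
    "```python\n",
    "import numpy as np\n",
    "\n",
    "rng = np.random.default_rng(0)\n",
    "\n",
    "# Toy \"pre-sparsified\" weights: zero out the 90% of entries with the smallest magnitude\n",
    "weights = rng.normal(size=(10, 10))\n",
    "threshold = np.quantile(np.abs(weights), 0.9)\n",
    "weights[np.abs(weights) < threshold] = 0.0\n",
    "mask = weights != 0.0  # positions that survived pruning\n",
    "\n",
    "def masked_sgd_step(w, grad, lr=0.1):\n",
    "    # Plain SGD update, then re-apply the fixed mask so pruned weights stay at zero\n",
    "    return (w - lr * grad) * mask\n",
    "\n",
    "for _ in range(5):\n",
    "    fake_grad = rng.normal(size=weights.shape)  # stand-in for a real gradient\n",
    "    weights = masked_sgd_step(weights, fake_grad)\n",
    "\n",
    "print(f\"sparsity after fine-tuning: {(weights == 0).mean():.0%}\")\n",
    "```\n",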
    "\n",
    "At the end, you will have a sparse model trained on your dataset, ready to be deployed with DeepSparse for GPU-class performance on CPUs!\n",
    "\n",
    "### **Pre-Sparsified BERT**\n",
    "SparseZoo, Neural Magic's open source repository of pre-sparsified models, contains a 90% pruned version of BERT, which has been sparsified on the upstream Wikipedia and BookCorpus datasets with the\n",
    "masked language modeling objective. [Check out the model card](https://sparsezoo.neuralmagic.com/models/nlp%2Fmasked_language_modeling%2Fobert-base%2Fpytorch%2Fhuggingface%2Fwikipedia_bookcorpus%2Fpruned90-none). We will use this model as the starting point for the transfer learning process.\n",
    "\n",
    "\n",
    "***Let's dive in!***"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "AkR1u2_NnXqY"
   },
   "outputs": [],
   "source": [
    "!pip install \"sparseml[transformers]\""
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "G4wyea9PP87u"
   },
   "source": [
    "If you are running on Google Colab, restart the runtime after this step."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "s199jAjIQLBJ"
   },
   "outputs": [],
   "source": [
    "import sparseml\n",
    "from sparsezoo import Model\n",
    "from sparseml.transformers.utils import SparseAutoModel\n",
    "from sparseml.transformers.sparsification import Trainer, TrainingArguments\n",
    "import numpy as np\n",
    "from transformers import (\n",
    "    AutoModelForSequenceClassification,\n",
    "    AutoConfig, \n",
    "    AutoTokenizer, \n",
    "    EvalPrediction, \n",
    "    default_data_collator\n",
    ")\n",
    "from datasets import load_dataset, load_metric"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "2rS2Q5kxFcW3"
   },
   "source": [
    "## **Step 1: Load a Dataset**\n",
    "\n",
    "SparseML is integrated with Hugging Face, so we can use the Hugging Face `datasets` library to load datasets from the Hugging Face hub or from local files.\n",
    "\n",
    "[Rotten Tomatoes Dataset Card](https://huggingface.co/datasets/rotten_tomatoes)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "nT8RoT-yGFxy"
   },
   "outputs": [],
   "source": [
    "# load dataset natively\n",
    "dataset = load_dataset(\"rotten_tomatoes\")\n",
    "\n",
    "# alternatively, save to CSV and reload, as an example of loading from local files\n",
    "dataset[\"train\"].to_csv(\"rotten_tomatoes-train.csv\")\n",
    "dataset[\"validation\"].to_csv(\"rotten_tomatoes-validation.csv\")\n",
    "data_files = {\n",
    "  \"train\": \"rotten_tomatoes-train.csv\",\n",
    "  \"validation\": \"rotten_tomatoes-validation.csv\"\n",
    "}\n",
    "dataset_from_json = load_dataset(\"csv\", data_files=data_files)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "aB8nezNJQ9Rz"
   },
   "outputs": [],
   "source": [
    "print(dataset_from_json)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "_5kKXKmHGrQm"
   },
   "outputs": [],
   "source": [
    "!head rotten_tomatoes-train.csv --lines=5"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "fGw671iuRHo3"
   },
   "outputs": [],
   "source": [
    "# dataset settings used in the steps below\n",
    "INPUT_COL_1 = \"text\"\n",
    "INPUT_COL_2 = None\n",
    "LABEL_COL = \"label\"\n",
    "NUM_LABELS = len(dataset_from_json[\"train\"].unique(LABEL_COL))\n",
    "print(NUM_LABELS)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "3BfXUE9HHFoq"
   },
   "source": [
    "## **Step 2: Set Up the Evaluation Metric**\n",
    "\n",
    "Sentiment analysis is a single-sequence binary classification problem. We will use accuracy as the evaluation metric.\n",
    "\n",
    "Since SparseML is integrated with Hugging Face, we can write a standard Hugging Face `compute_metrics` function for evaluation, which will be passed to the `Trainer` class below."
   ]
  },
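  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The core of `compute_metrics` is an `argmax` over the class logits followed by a comparison with the reference labels. The plain-NumPy sketch below reproduces that computation on made-up logits for four examples; the resulting fraction is what the `accuracy` metric reports.\n",
    "\n",
    "```python\n",
    "import numpy as np\n",
    "\n",
    "# Made-up logits for 4 validation examples and 2 classes (0 = negative, 1 = positive)\n",
    "logits = np.array([[2.0, -1.0], [0.1, 0.3], [-0.5, 1.5], [1.2, 0.4]])\n",
    "labels = np.array([0, 1, 0, 0])\n",
    "\n",
    "preds = np.argmax(logits, axis=1)  # highest-scoring class per example\n",
    "accuracy = float((preds == labels).mean())  # fraction of correct predictions\n",
    "print(preds, accuracy)  # [0 1 1 0] 0.75\n",
    "```"
   ]
  },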
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "PL8HbrQzHRCF"
   },
   "outputs": [],
   "source": [
    "metric = load_metric(\"accuracy\")\n",
    "\n",
    "def compute_metrics(p: EvalPrediction):\n",
    "  preds = p.predictions[0] if isinstance(p.predictions, tuple) else p.predictions\n",
    "  preds = np.argmax(preds, axis=1)\n",
    "  result = metric.compute(predictions=preds, references=p.label_ids)\n",
    "  if len(result) > 1:\n",
    "      result[\"combined_score\"] = np.mean(list(result.values())).item()\n",
    "  return result"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {
    "id": "3AQXOsNOQh22"
   },
   "source": [
    "## **Step 3: Download Files for Sparse Transfer Learning**\n",
    "\n",
    "First, we need to select a sparse checkpoint to begin the training process. In this case, we will fine-tune a 90% pruned version of BERT onto the Rotten Tomatoes dataset. This model is available in SparseZoo, identified by the following stub:\n",
    "```\n",
    "zoo:nlp/masked_language_modeling/obert-base/pytorch/huggingface/wikipedia_bookcorpus/pruned90-none\n",
    "```\n",
    "\n",
    "Next, we need a sparsification recipe to use in the training process. Recipes are YAML files that encode the sparsity-related algorithms and parameters to be applied by SparseML. For Sparse Transfer Learning, we need a recipe that instructs SparseML to maintain sparsity during training and to apply quantization over the final few epochs. SparseZoo contains a transfer recipe that was used to fine-tune BERT onto the SST2 task. Since Rotten Tomatoes, like SST2, is a binary sentiment classification task on movie reviews, we will reuse the SST2 recipe, which is identified by the following stub:\n",
    "```\n",
    "zoo:nlp/sentiment_analysis/obert-base/pytorch/huggingface/sst2/pruned90_quant-none\n",
    "```\n",
    "\n",
    "Finally, SparseML can optionally apply model distillation from a teacher model during transfer learning to boost accuracy. Since SparseML is integrated with Hugging Face, we can use a model from the Hugging Face hub as the teacher. We will use a BERT-base model fine-tuned on Rotten Tomatoes by `textattack` ([Model Card](https://huggingface.co/textattack/bert-base-uncased-rotten-tomatoes)). It is identified by the following:\n",
    "\n",
    "```\n",
    "textattack/bert-base-uncased-rotten-tomatoes\n",
    "```\n",
    "\n",
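    "Use the `sparsezoo` Python client to download the sparse model and the recipe using their SparseZoo stubs. The teacher needs no separate download step: it is fetched from the Hugging Face hub when it is loaded in Step 4."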
   ]
  },
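  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Both SparseZoo stubs above share the same slash-separated layout, which makes it easy to see what changes between the upstream model and the transfer recipe. The sketch below splits the two stubs from this step into fields. The field names and the `describe_stub` helper are assumptions made for illustration, not the SparseZoo client's own parser.\n",
    "\n",
    "```python\n",
    "# Assumed field names for the seven path segments of a SparseZoo stub\n",
    "FIELDS = [\"domain\", \"sub_domain\", \"architecture\", \"framework\", \"repo\", \"dataset\", \"sparse_tag\"]\n",
    "\n",
    "def describe_stub(stub: str) -> dict:\n",
    "    # Hypothetical helper: strip the \"zoo:\" prefix and label each path segment\n",
    "    if not stub.startswith(\"zoo:\"):\n",
    "        raise ValueError(f\"not a SparseZoo stub: {stub}\")\n",
    "    parts = stub[len(\"zoo:\"):].split(\"/\")\n",
    "    if len(parts) != len(FIELDS):\n",
    "        raise ValueError(f\"expected {len(FIELDS)} segments, got {len(parts)}\")\n",
    "    return dict(zip(FIELDS, parts))\n",
    "\n",
    "upstream = describe_stub(\"zoo:nlp/masked_language_modeling/obert-base/pytorch/huggingface/wikipedia_bookcorpus/pruned90-none\")\n",
    "transfer = describe_stub(\"zoo:nlp/sentiment_analysis/obert-base/pytorch/huggingface/sst2/pruned90_quant-none\")\n",
    "\n",
    "# Print only the fields that differ between the two stubs\n",
    "for field in FIELDS:\n",
    "    if upstream[field] != transfer[field]:\n",
    "        print(f\"{field}: {upstream[field]} -> {transfer[field]}\")\n",
    "```\n",
    "\n",
    "The differing fields show that the recipe stub points at the same `obert-base` architecture fine-tuned on SST2 with quantization (`pruned90_quant`), while the starting weights come from the upstream masked language modeling run."
   ]
  },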
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "-P-Edh9gSG-X"
   },
   "outputs": [],
   "source": [
    "# downloads 90% pruned upstream BERT trained on MLM objective\n",
    "model_stub = \"zoo:nlp/masked_language_modeling/obert-base/pytorch/huggingface/wikipedia_bookcorpus/pruned90-none\" \n",
    "model_path = Model(model_stub, download_path=\"./model\").training.path \n",
    "\n",
    "# downloads transfer recipe for SST2 (pruned90_quant)\n",
    "transfer_stub = \"zoo:nlp/sentiment_analysis/obert-base/pytorch/huggingface/sst2/pruned90_quant-none\"\n",
    "recipe_path = Model(transfer_stub, download_path=\"./transfer_recipe\").recipes.default.path"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "trWVl9FASdgh"
   },
   "outputs": [],
   "source": [
    "# https://huggingface.co/textattack/bert-base-uncased-rotten-tomatoes\n",
    "# this is a model from the huggingface hub, trained on rotten tomatoes\n",
    "teacher_path = \"textattack/bert-base-uncased-rotten-tomatoes\""
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "RLe8iEWxV_zz"
   },
   "source": [
    "We can see that the upstream model (trained on Wikipedia and BookCorpus) and its configuration files have been downloaded to the local directory."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "0NTVj1kPRSCW"
   },
   "outputs": [],
   "source": [
    "%ls ./model/training"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Inspecting the Recipe\n",
    "\n",
    "Here is the transfer learning recipe:\n",
    "\n",
    "```yaml\n",
    "version: 1.1.0\n",
    "\n",
    "# General Variables\n",
    "num_epochs: &num_epochs 13\n",
    "init_lr: 1.5e-4\n",
    "final_lr: 0\n",
    "\n",
    "qat_start_epoch: &qat_start_epoch 8.0\n",
    "observer_epoch: &observer_epoch 12.0\n",
    "quantize_embeddings: &quantize_embeddings 1\n",
    "\n",
    "distill_hardness: &distill_hardness 1.0\n",
    "distill_temperature: &distill_temperature 2.0\n",
    "\n",
    "weight_decay: 0.01\n",
    "\n",
    "# Modifiers:\n",
    "\n",
    "training_modifiers:\n",
    "  - !EpochRangeModifier\n",
    "      end_epoch: eval(num_epochs)\n",
    "      start_epoch: 0.0\n",
    "  - !LearningRateFunctionModifier\n",
    "      start_epoch: 0\n",
    "      end_epoch: eval(num_epochs)\n",
    "      lr_func: linear\n",
    "      init_lr: eval(init_lr)\n",
    "      final_lr: eval(final_lr)\n",
    "\n",
    "quantization_modifiers:\n",
    "  - !QuantizationModifier\n",
    "      start_epoch: eval(qat_start_epoch)\n",
    "      disable_quantization_observer_epoch: eval(observer_epoch)\n",
    "      freeze_bn_stats_epoch: eval(observer_epoch)\n",
    "      quantize_embeddings: eval(quantize_embeddings)\n",
    "      quantize_linear_activations: 0\n",
    "      exclude_module_types: ['LayerNorm', 'Tanh']\n",
    "      submodules:\n",
    "        - bert.embeddings\n",
    "        - bert.encoder\n",
    "        - bert.pooler\n",
    "        - classifier\n",
    "\n",
    "\n",
    "distillation_modifiers:\n",
    "  - !DistillationModifier\n",
    "     hardness: eval(distill_hardness)\n",
    "     temperature: eval(distill_temperature)\n",
    "     distill_output_keys: [logits]\n",
    "\n",
    "constant_modifiers:\n",
    "  - !ConstantPruningModifier\n",
    "      start_epoch: 0.0\n",
    "      params: __ALL_PRUNABLE__\n",
    "\n",
    "regularization_modifiers:\n",
    "  - !SetWeightDecayModifier\n",
    "      start_epoch: 0.0\n",
    "      weight_decay: eval(weight_decay)\n",
    "```\n",
    "\n",
    "\n",
    "The `Modifiers` in the transfer learning recipe are the important items that encode how SparseML should modify the training process for Sparse Transfer Learning:\n",
    "- `ConstantPruningModifier` tells SparseML to keep pruned weights at zero for the entire run (`params: __ALL_PRUNABLE__`), preserving the sparsity structure of the network\n",
    "- `QuantizationModifier` tells SparseML to apply quantization aware training (QAT) from epoch 8 onward, i.e. over the last 5 of the 13 epochs\n",
    "- `DistillationModifier` tells SparseML how to apply distillation during the training process (hardness 1.0, temperature 2.0), using the teacher's `logits` as the distillation target\n",
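    "\n",
    "The epoch arithmetic behind these modifiers can be checked directly from the recipe variables. The short sketch below copies the values from the recipe; the `linear_lr` helper is hypothetical and assumes that `lr_func: linear` interpolates straight from `init_lr` at epoch 0 to `final_lr` at `num_epochs`.\n",
    "\n",
    "```python\n",
    "# Values copied from the recipe above\n",
    "num_epochs = 13\n",
    "init_lr, final_lr = 1.5e-4, 0.0\n",
    "qat_start_epoch, observer_epoch = 8.0, 12.0\n",
    "\n",
    "def linear_lr(epoch: float) -> float:\n",
    "    # Assumed schedule: linear interpolation between init_lr and final_lr, clamped to [0, num_epochs]\n",
    "    frac = min(max(epoch / num_epochs, 0.0), 1.0)\n",
    "    return init_lr + (final_lr - init_lr) * frac\n",
    "\n",
    "qat_epochs = num_epochs - qat_start_epoch  # epochs trained with quantization aware training\n",
    "frozen_epochs = num_epochs - observer_epoch  # final epochs with quantization observers disabled\n",
    "print(f\"QAT epochs: {qat_epochs:g}, observers frozen for the last {frozen_epochs:g}\")\n",
    "for epoch in (0, 8, 12, 13):\n",
    "    print(f\"epoch {epoch:>2}: lr = {linear_lr(epoch):.2e}\")\n",
    "```\n",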
    "\n",
    "Below, SparseML's `Trainer` parses these modifiers and updates the training process to implement the algorithms they specify."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "FStnDScEKoMX"
   },
   "source": [
    "## **Step 4: Set Up Hugging Face Model Objects**\n",
    "\n",
    "Next, we will set up the Hugging Face tokenizer, config, and model.\n",
    "\n",
    "These are all native Hugging Face objects, so check out the Hugging Face docs for more details on `AutoModel`, `AutoConfig`, and `AutoTokenizer` as needed.\n",
    "\n",
    "For the student, we instantiate these classes by passing the local path to the directory containing the `pytorch_model.bin`, `tokenizer.json`, and `config.json` files from the SparseZoo download. The teacher is loaded the same way, using its Hugging Face hub identifier instead of a local path."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "8kmlE1PdB2nB"
   },
   "outputs": [],
   "source": [
    "# we can use a shared tokenizer since both are BERT\n",
    "# see examples for using a separate tokenizer\n",
    "tokenizer = AutoTokenizer.from_pretrained(model_path)\n",
    "\n",
    "# setup model configs\n",
    "model_config = AutoConfig.from_pretrained(model_path, num_labels=NUM_LABELS)\n",
    "teacher_config = AutoConfig.from_pretrained(teacher_path, num_labels=NUM_LABELS)\n",
    "\n",
    "# initialize model using familiar HF AutoModel\n",
    "model_kwargs = {\"config\": model_config}\n",
    "model_kwargs[\"state_dict\"], s_delayed = SparseAutoModel._loadable_state_dict(model_path)\n",
    "model = AutoModelForSequenceClassification.from_pretrained(model_path, **model_kwargs)\n",
    "\n",
    "# initialize teacher using familiar HF AutoModel\n",
    "teacher_kwargs = {\"config\": teacher_config}\n",
    "teacher_kwargs[\"state_dict\"], t_delayed = SparseAutoModel._loadable_state_dict(teacher_path)\n",
    "teacher = AutoModelForSequenceClassification.from_pretrained(teacher_path, **teacher_kwargs)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "K1JSDkCdMghS"
   },
   "source": [
    "## **Step 5: Tokenize Dataset**\n",
    "\n",
    "Run the tokenizer on the dataset. This is standard Hugging Face functionality."
   ]
  },
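  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The `padding=\"max_length\"` and `truncation=True` arguments below make every tokenized example exactly `min(tokenizer.model_max_length, MAX_LEN)` tokens long, with an `attention_mask` marking which positions hold real tokens. The pure-Python sketch below mimics that behavior on hypothetical token ids. It is simplified (a real BERT tokenizer also inserts `[CLS]`/`[SEP]` and keeps the final `[SEP]` when truncating), and `pad_and_truncate` is a hypothetical helper.\n",
    "\n",
    "```python\n",
    "PAD_ID = 0  # assumption: id of the padding token ([PAD] is 0 in the BERT uncased vocabulary)\n",
    "\n",
    "def pad_and_truncate(token_ids, max_length):\n",
    "    ids = list(token_ids[:max_length])  # truncation: drop tokens beyond max_length\n",
    "    n_pad = max_length - len(ids)\n",
    "    return {\n",
    "        \"input_ids\": ids + [PAD_ID] * n_pad,  # padding up to a fixed length\n",
    "        \"attention_mask\": [1] * len(ids) + [0] * n_pad,  # 1 = real token, 0 = padding\n",
    "    }\n",
    "\n",
    "print(pad_and_truncate([7, 8, 9], max_length=5))\n",
    "print(pad_and_truncate(list(range(1, 9)), max_length=5))\n",
    "```"
   ]
  },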
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "x3v3WFrHFLoO"
   },
   "outputs": [],
   "source": [
    "MAX_LEN = 128\n",
    "def preprocess_fn(examples):\n",
    "  args = None\n",
    "  if INPUT_COL_2 is None:\n",
    "    args = (examples[INPUT_COL_1], )\n",
    "  else:\n",
    "    args = (examples[INPUT_COL_1], examples[INPUT_COL_2])\n",
    "  result = tokenizer(*args, \n",
    "                   padding=\"max_length\", \n",
    "                   max_length=min(tokenizer.model_max_length, MAX_LEN), \n",
    "                   truncation=True)\n",
    "  return result\n",
    "\n",
    "tokenized_dataset = dataset_from_json.map(\n",
    "    preprocess_fn,\n",
    "    batched=True,\n",
    "    desc=\"Running tokenizer on dataset\"\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "19mnPsKHN_y1"
   },
   "source": [
    "## **Step 6: Run Training**\n",
    "\n",
    "SparseML has a custom `Trainer` class that inherits from the [Hugging Face `Trainer` Class](https://huggingface.co/docs/transformers/main_classes/trainer). As such, the SparseML `Trainer` has all of the existing functionality of the HF trainer. In addition, we can supply a `recipe` and (optionally) a `teacher`.\n",
    "\n",
    "\n",
    "As we saw above, the `recipe` encodes the sparsity-related algorithms and hyperparameters of the training process in a YAML file. The SparseML `Trainer` parses the `recipe` and adjusts the training workflow to apply the algorithms in the recipe.\n",
    "\n",
    "The `teacher` is an optional argument that instructs SparseML to apply model distillation during training. Here, we pass the `teacher` model downloaded from the Hugging Face hub."
   ]
  },
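  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The recipe sets `hardness: 1.0` and `temperature: 2.0` for the `DistillationModifier`. As a rough illustration of what these two settings control, the NumPy sketch below implements a common Hinton-style distillation loss: a KL divergence between temperature-softened teacher and student distributions (scaled by T^2, a frequent convention), blended with the ordinary cross-entropy on the labels according to `hardness`. This is an assumption about the general technique, not SparseML's exact code; `distillation_loss` and the sample logits are hypothetical.\n",
    "\n",
    "```python\n",
    "import numpy as np\n",
    "\n",
    "def softmax(logits, temperature=1.0):\n",
    "    z = logits / temperature\n",
    "    z = z - z.max(axis=-1, keepdims=True)  # subtract the row max for numerical stability\n",
    "    e = np.exp(z)\n",
    "    return e / e.sum(axis=-1, keepdims=True)\n",
    "\n",
    "def cross_entropy(logits, labels):\n",
    "    probs = softmax(logits)\n",
    "    return float(-np.log(probs[np.arange(len(labels)), labels]).mean())\n",
    "\n",
    "def distillation_loss(student_logits, teacher_logits, labels, hardness=1.0, temperature=2.0):\n",
    "    # KL(teacher || student) between temperature-softened distributions, scaled by T^2\n",
    "    p_teacher = softmax(teacher_logits, temperature)\n",
    "    p_student = softmax(student_logits, temperature)\n",
    "    kl = (p_teacher * (np.log(p_teacher) - np.log(p_student))).sum(axis=-1).mean()\n",
    "    distill = float(kl) * temperature ** 2\n",
    "    # hardness = 1.0 (as in the recipe) uses only the distillation term\n",
    "    return hardness * distill + (1.0 - hardness) * cross_entropy(student_logits, labels)\n",
    "\n",
    "student = np.array([[1.0, 0.0], [0.2, 0.8]])  # hypothetical student logits\n",
    "teacher = np.array([[3.0, -1.0], [-1.0, 2.0]])  # hypothetical teacher logits\n",
    "labels = np.array([0, 1])\n",
    "print(distillation_loss(student, teacher, labels))\n",
    "print(distillation_loss(teacher, teacher, labels))  # matching the teacher exactly gives zero loss\n",
    "```"
   ]
  },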
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "HJeTlf5R8npQ"
   },
   "outputs": [],
   "source": [
    "training_args = TrainingArguments(\n",
    "    output_dir=\"./training_output\",\n",
    "    do_train=True,\n",
    "    do_eval=True,\n",
    "    resume_from_checkpoint=False,\n",
    "    evaluation_strategy=\"epoch\",\n",
    "    save_strategy=\"epoch\",\n",
    "    logging_strategy=\"epoch\",\n",
    "    save_total_limit=1,\n",
    "    per_device_train_batch_size=32,\n",
    "    per_device_eval_batch_size=32,\n",
    "    fp16=True)\n",
    "\n",
    "trainer = Trainer(\n",
    "    model=model,\n",
    "    model_state_path=model_path,\n",
    "    recipe=recipe_path,\n",
    "    teacher=teacher,\n",
    "    metadata_args=[\"per_device_train_batch_size\",\"per_device_eval_batch_size\",\"fp16\"],\n",
    "    args=training_args,\n",
    "    train_dataset=tokenized_dataset[\"train\"],\n",
    "    eval_dataset=tokenized_dataset[\"validation\"],\n",
    "    tokenizer=tokenizer,\n",
    "    data_collator=default_data_collator,\n",
    "    compute_metrics=compute_metrics)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "bNrip0sYifOE"
   },
   "outputs": [],
   "source": [
    "train_result = trainer.train(resume_from_checkpoint=False)\n",
    "trainer.save_model()  # Saves the tokenizer too for easy upload\n",
    "trainer.save_state()\n",
    "trainer.save_optimizer_and_scheduler(training_args.output_dir)"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## **Step 7: Export To ONNX**\n",
    "\n",
    "Run the following to export the model to ONNX. The script creates a `deployment` folder containing the ONNX file and the configuration files needed for deployment with DeepSparse (e.g. `tokenizer.json`)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "-rhWjiHBeR7M"
   },
   "outputs": [],
   "source": [
    "!sparseml.transformers.export_onnx \\\n",
    "  --model_path training_output \\\n",
    "  --task text_classification"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## **Next Steps**\n",
    "\n",
    "Check out the DeepSparse repository for more details on deploying your sparse models with GPU-class performance on CPUs!"
   ]
  }
 ],
 "metadata": {
  "accelerator": "GPU",
  "colab": {
   "authorship_tag": "ABX9TyNEjp5SX1YeYx+sAruCDPsu",
   "provenance": [
    {
     "file_id": "1i07Wcle4yXpC9kyWtzNtwF4v-4kwfZoT",
     "timestamp": 1677204992942
    },
    {
     "file_id": "1Zawa0sifXr2wIl9tbF7ySJ7xYY0dtTzI",
     "timestamp": 1677193660159
    }
   ]
  },
  "gpuClass": "standard",
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.10.6"
  },
  "vscode": {
   "interpreter": {
    "hash": "b0fa6594d8f4cbf19f97940f81e996739fb7646882a419484c72d19e05852a7e"
   }
  }
 },
 "nbformat": 4,
 "nbformat_minor": 1
}
